IN THE CLAIMS: 

Please amend the claims as follows: 

1. (Canceled) 

2. (Canceled) 

3. (Canceled) 

4. (Canceled) 

5. (Canceled) 

6. (Canceled) 

7. (Canceled) 

8. (Canceled) 

9. (Canceled) 

10. (Canceled) 

11. (Amended) A machine executable program for use in a voice query recognition system that is
distributed across a client system and a separate server system, the program comprising:
a first audio signal receiving routine for receiving user speech utterance signals representing speech 
utterances to be recognized during a sequence of speech utterance evaluation time frames, said 
speech utterances including sentences comprised of one or more words; and 

a first signal processing routine adapted to generate representative speech data values for 
each speech utterance evaluation time frame during which speech utterance signals are received, said 
representative speech data values including a set of compressed mel-frequency cepstral coefficients 
(MFCC);

a formatting routine for rendering said representative speech data values into a transmission format 
suitable for transmission from the client system over a communications channel to a second 
processing routine executing on the server computing system; and 

wherein said representative speech data values are transmitted continuously during said 
speech utterances within streaming packets and without waiting for silence to be detected and/or
said speech utterances to be completed; 

further wherein said representative speech data values constitute a minimum amount of 
information that can be used by said second processing routine to complete accurate recognition of 
said one or more words and said sentences. 
12. (Original) The program of claim 11, wherein said program works within a browser program
executing on said computing system as part of a client-server based system. 



13. (Amended) The program of claim 11, wherein each of said representative speech data values
corresponds to said set of compressed MFCCs is generated at a rate corresponding to at least 100 
frames per second, and such that said set of compressed MFCCs includes a separate cepstral 
coefficient value for a corresponding frequency component of said user speech utterance signals, 
and said first data content corresponds to a set of said frequency components spanning an audible 
speech frequency range. 

14. (Amended) The program of claim [13] 11, wherein [said additional data content corresponds
to including] a set of delta and acceleration coefficients is computed from [a corresponding] said set of
compressed MFCCs [said cepstral coefficient values] at either said client system or said server
computing system on a connection by connection basis based on an evaluation of computing
resources available at such client system and said server computing system.
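
For illustration only, and not as part of the claim language, the delta and acceleration computation recited in claim 14 can be sketched as follows. The claim does not state a formula; the sketch assumes the conventional regression formula over a window of neighboring frames, with edge frames repeated as padding. The names `deltas`, `append_delta_and_acceleration`, `choose_delta_site`, and the capacity threshold are hypothetical.

```python
from typing import List

Frame = List[float]  # one set of cepstral coefficients for one time frame


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative of each coefficient.

    Assumed formula (not taken from the claims):
        d[t] = sum_{n=1..window} n * (c[t+n] - c[t-n]) / (2 * sum n^2)
    Frames past either end are replaced by the first or last frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_delta_and_acceleration(mfcc: List[Frame]) -> List[Frame]:
    """Return frames holding the MFCCs, their deltas, and their accelerations."""
    d = deltas(mfcc)
    a = deltas(d)  # acceleration is the delta of the delta
    return [c + dd + aa for c, dd, aa in zip(mfcc, d, a)]


def choose_delta_site(client_free_capacity: float, required_capacity: float) -> str:
    """Hypothetical per-connection rule: compute on the client only if it has room."""
    return "client" if client_free_capacity >= required_capacity else "server"
```

Under this sketch, a connection from a constrained client would send only the compressed MFCCs, and the server would call `append_delta_and_acceleration` on receipt.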

15. (Amended) The program of claim 11, wherein said second processing routine is configured 
with an amount of resources by said server computing system based on a bandwidth and 
transmission speed associated with a transmission link between said server computing system and 
said client system so that said second processing routine performs accurate recognition of said one 
or more words with [less] a first latency that is less than a second latency that would result [would that
resulting] if said one or more words were recognized by said first signal processing routine and then
transmitted over said transmission link.

16. (Canceled) 

17. (Canceled) 

18. (Canceled) 

19. (Canceled) 

20. (Canceled) 

21. (Canceled) 

22. (Canceled) 
23. (Canceled)

24. (Canceled) 

25. (Canceled) 

26. (Canceled)

27. (Amended) A system for assisting a client computing device to perform speech recognition in 
cooperation with a server computing device, the system comprising: 

a speech utterance capture circuit for receiving a speech utterance and generating associated speech 
utterance signals, where said speech utterance can include an articulated sentence of one or more 
articulated words; and 

a speech utterance signal processing circuit, said signal processing being configurable to perform 
data extracting operations on said speech utterance signals to generate a set of frequency related 
speech utterance signals for said articulated sentence; and

wherein said set of frequency related speech utterance signals include a set of compressed 
mel-frequency cepstral coefficients (MFCC); 

a transmission circuit for coding said set of frequency related speech utterance signals into a format 
suitable for transmission over a communications channel to the server; 

a receiving circuit for receiving a response to said articulated sentence through said 
communications channel from the server, said response being generated by said server using said set 
of frequency related speech utterance signals to perform a word recognition operation on said one 
or more articulated words and a sentence recognition operation on said articulated sentence; and 

wherein a latency associated with performing said speech recognition is minimized by optimizing 
an allocation of signal processing responsibilities for said speech utterance signals between the client 
computing device and the server computing device on a case-by-case basis in accordance with signal 
processing capabilities of the client computing device.

28. (Amended) A system for assisting a client computing device to perform real-time speech
recognition in cooperation with a server computing device, the system comprising: 

a sound processing circuit integrated within the client computing device, said sound processing 
circuit being adapted to receive a continuous speech utterance and to generate associated speech 
utterance signals therefrom, wherein said speech utterance can include an articulated sentence of 
one or more articulated words; and 

a first signal processing routine adapted to be executed by the client computing device, and which 
first signal processing routine is further adapted to continuously generate a set of speech-based 
vector coefficients as needed from said speech utterance signals; and 

a transmission circuit coupled to the client computing device for coding said set of speech based 
vector coefficients into a format suitable for transmission over a communications channel to the 
server, said set of speech-based vector coefficients being continuously transmitted in real-time 
within a Hypertext Transport Protocol (HTTP) byte stream as said speech utterances occur; 

a receiving circuit coupled to the client computing device for receiving a real-time response to said 
articulated sentence through said communications channel from the server; 

wherein said response is generated by said server substantially on a real-time basis using said set of 
speech based vector coefficients to perform a second signal processing routine which completes a 
word recognition operation on said one or more articulated words, as well as a sentence recognition 
operation on said articulated sentence; 

further wherein at least some words are recognized in real-time before said speech utterance is
completed.
29. (Amended) A distributed speech recognition system for processing a speech utterance
comprising: 

a first signal processing circuit associated with a client computing system, said first signal 
processing circuit being adapted to generate a first set of speech data values from speech utterance 
signals, wherein said first set of speech data values have a limited data content and are compressed 
without quantization to reduce processing and transmission latencies in the distributed speech 
recognition system; 

a second signal processing circuit associated with a separate server computing system, said 
second signal processing circuit being configured to generate a second set of speech data values 
derived from said first set of speech data values, and being further configured to generate a 
combined speech data value set consisting of said second set of speech data values and said first set 
of data values; 

a word recognition circuit adapted to use said combined speech data value set [and] for
[generating] recognizing words in the speech utterance, said word recognition circuit being configured
to recognize words before said speech utterance is finished. 

30. (Original) The system of claim 29, further including a sentence recognition circuit which 
recognizes an articulated sentence containing said recognized words. 

31. (Original) The system of claim 30, wherein said articulated sentence can include one of a 
number of predefined sentences recognizable by said system, and said articulated sentence is 
recognized by identifying a candidate set of potential sentences from said number of predefined 
sentences corresponding to said articulated sentence, and then comparing each entry in the 
candidate set of potential sentences to said articulated sentence to determine a matching
recognized sentence. 

32. (Original) The system of claim 31, wherein said articulated sentence is processed by a natural 
language engine operating on said recognized words. 

33. (Amended) The system of claim 32, wherein said articulated sentence is compared against said 
candidate set of potential sentences by examining noun phrases including noun phrases 
consisting of multiple words.

34. (Original) The system of claim 31, wherein said candidate set of potential sentences are 
determined in part by a context dictionary loaded by said sentence recognition circuit in 
response to an operating environment presented by said system to a user. 
35. (Canceled)

36. (Canceled) 

37. (Canceled) 

38. (Canceled) 

39. (Canceled) 

40. (Canceled) 

41. (Canceled) 

42. (Canceled) 

43. (Canceled) 

44. (Canceled) 

45. (Canceled) 



46. (Original) The system of claim 11, further wherein said second processing means performs a 
query for determining which of said one or more words correspond to said one or more text words. 

47. (Canceled) 

48. (Canceled) 

49. (Canceled) 

50. (Canceled) 

51. (Canceled) 

52. (Canceled) 
53. (Amended) A method of performing distributed voice recognition comprising the steps of:

(a) receiving user speech utterance signals representing speech utterances to be recognized 
during a sequence of speech utterance evaluation time frames, said speech utterances 
including sentences comprised of one or more words; and 

(b) generating representative speech data values with a first processing circuit for each speech 
utterance evaluation time frame during which speech utterance signals are received, said 
representative speech data values including a set of compressed mel-frequency cepstral 
coefficients (MFCC);

(c) encoding said representative speech data values into a transmission format suitable for 
transmission over a communications channel to a second processing circuit; and 

further wherein said representative speech data values constitute a minimum amount of 
information that can be used by said second processing circuit to complete accurate 
recognition of said one or more words and said sentences. 

54. (Original) The method of claim 53, wherein said recognition of said one or more words 
occurs in real-time. 

55. (Original) The method of claim 53, wherein each of said representative speech data values
corresponds to said set of compressed MFCCs is generated at a rate corresponding to at least 100 
frames per second, and such that said set of compressed MFCCs includes a separate cepstral 
coefficient value for a corresponding frequency component of said user speech utterance signals, 
and said first data content corresponds to a set of said frequency components spanning an audible 
speech frequency range. 

56. (Amended) The method of claim 55, wherein a set of delta and acceleration coefficients 
are computed from said cepstral coefficient values to complete recognition of said one or more 
words and said sentences, wherein such set of delta and acceleration coefficients are computed at 
either said first processing circuit or said second processing circuit on a connection by connection 
basis based on an evaluation of computing resources available at such respective processing circuits. 
57. (Amended) The method of claim 53, wherein said second processing circuit is configured with
an amount of resources by a server computing system based on a bandwidth and transmission speed 
associated with a transmission link between said server computing system and a client system 
associated with the first processing circuit, so that said second processing circuit performs accurate 
recognition of said one or more words with [less] a first latency that is less than a second latency that
would result [would that resulting] if said one or more words were recognized by said first processing
circuit and then transmitted over said transmission link.

58. (Amended) A method of performing distributed speech recognition using a first computing 
device and a second computing device, the method comprising the steps of: 

(a) evaluating speech processing capabilities of the first computing device using an 
initialization routine; and 

(b) evaluating a transmission latency of a communications channel coupling the first 
computing device and the second computing device; and 

(c) allocating speech processing tasks between the first computing device and the 
second computing device based on results of steps (a) and (b), such that an overall
speech recognition process is customized on a case-by-case basis for performance 
characteristics of the first computing device and the second computing device; and 

(d) receiving a speech utterance at the first computing device; and 

(e) generating associated speech utterance signals from said speech utterance with the 
first computing device; and 

(f) generating a first set of speech data values from said speech utterance signals at the
first computing device, said first set of speech data values being insufficient by themselves 
for permitting recognition of words articulated in said speech utterance; and 

(g) formatting said first set of speech data values at the first computing device to be 
compatible with a communications protocol used by [a] said communications channel
coupled to the first computing device;

(h) transmitting said first set of speech data values through said channel to the second 
computing device; and 

(i) generating a second set of speech data values based on said speech data values, such 
that second set of speech data values contain sufficient information to be usable by a word 
recognition engine for recognizing words in said speech utterance. 
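
For illustration only, and not as part of the claim language, steps (a) through (c) of claim 58 can be sketched as a search for the split point that minimizes estimated end-to-end time. The claim does not prescribe a cost model; the sketch assumes one in which transfer time is channel latency plus payload size divided by bandwidth, and it assumes the final stage always runs on the second computing device. The `Stage` type, `allocate` function, and all stage names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    ops: float          # hypothetical work units needed by this stage
    output_bytes: int   # size of this stage's output for one utterance


def allocate(
    stages: List[Stage],
    input_bytes: int,
    client_ops_per_s: float,     # result of step (a): client capability
    server_ops_per_s: float,
    channel_latency_s: float,    # result of step (b): channel latency
    channel_bytes_per_s: float,
) -> Tuple[int, float]:
    """Return (split, seconds): stages[:split] run on the first device.

    The last stage is kept on the second device (an assumption of this
    sketch), so split ranges from 0 to len(stages) - 1.
    """
    best_split, best_time = 0, float("inf")
    for split in range(len(stages)):
        client_time = sum(s.ops for s in stages[:split]) / client_ops_per_s
        server_time = sum(s.ops for s in stages[split:]) / server_ops_per_s
        # Data crossing the channel is the output of the last client stage,
        # or the raw input when the client does no processing.
        payload = input_bytes if split == 0 else stages[split - 1].output_bytes
        transfer = channel_latency_s + payload / channel_bytes_per_s
        total = client_time + transfer + server_time
        if total < best_time:
            best_split, best_time = split, total
    return best_split, best_time
```

With hypothetical stages for coefficient extraction, derived coefficients, and decoding, a client of moderate speed on a slow link would be assigned only the extraction stage, while a very slow client would be assigned none.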
59. (Original) The method of claim 58, wherein said second set of speech data values include
said first set of speech data values and a derived set of speech data values, which derived set of 
speech data values are computed based on said first speech data values. 

60. (Original) The method of claim 58, wherein said second set of speech data values can be 
generated by said second computing device in a time that is less than the combination of a first time 
which would be required by said first computing device to generate said second set of speech data 
values from said first set of speech data values combined with a second time which would be 
required to format and transmit said second set of speech data values. 

61. (Original) The method of claim 58, wherein signal processing responsibilities of said first
and second computing devices are allocated such that said first computing device performs less than
approximately 1/2 the required signal processing operations needed to convert said speech utterance
signals into a form usable by a word recognition engine. 

62. (Amended) The method of claim 58, wherein [signal processing functions] said speech processing
tasks performed by said first and second computing devices are [configured] further allocated based
on [: (i) computing resources available to said first and second computing devices; and (ii)
transmission latencies of said communications channel] transmission speed capabilities of a
transceiver coupled to said first computing device.

63. (Original) The method of claim 58, wherein said first processing device is also configured 
to assist said second processing device with signal processing computations required to generate said 
second set of speech data values. 

64. (Original) The method of claim 58, wherein said first set of speech data values represent 
the least amount of data that can used by said second processing device to generate said second set 
of data values usable for a word recognition process. 
65. (Amended) A method of performing distributed recognition of a speech utterance comprising
the steps of: 

(a) generating a first set of speech data values from speech utterance signals at a first computing 
system, wherein said first set of speech data values have a limited data content to reduce 
processing and transmission latencies; and 

wherein said first set of speech data values include a set of compressed mel-frequency 
cepstral coefficients (MFCC); 

(b) generating a second set of speech data values derived from said first set of speech data 
values at a second computing system, said second computing system being independently
operable from said first computing system; and 

(c) generating a combined speech data value set at said second computing system consisting of 
said second set of speech data values and said first set of data values; 

(d) generating a list of recognized words in said speech utterance, said list being generated at 
least in part before said speech utterance is finished.

66. (Original) The method of claim 65, further including a step: (e) recognizing an articulated 
sentence containing said list of recognized words. 

67. (Original) The method of claim 66, wherein said articulated sentence can include one of a 
number of predefined recognizable sentences, and said articulated sentence is recognized by 
identifying a candidate set of potential sentences from said number of predefined sentences 
corresponding to said articulated sentence, and then comparing each entry in the candidate set of 
potential sentences to said articulated sentence to determine a matching recognized sentence.

68. (Original) The method of claim 67, wherein said articulated sentence is processed by a natural 
language engine operating on said recognized words. 

69. (Original) The method of claim 68, wherein said articulated sentence is compared against said 
candidate set of potential sentences by examining noun phrases. 

70. (Original) The method of claim 67, wherein said candidate set of potential sentences are 
determined in part by a context dictionary loaded in response to an operating environment 
presented to a user articulating said sentence. 